Accurate Reconstruction of Microbial Strains from Metagenomic
Authors
Abstract
Exploring the genetic diversity of microbes within the environment through metagenomic sequencing first requires classifying these reads into taxonomic groups. Current methods compare these sequencing data with existing biased and limited reference databases. Several recent evaluation studies demonstrate that current methods either lack sufficient sensitivity for species-level assignments or suffer from false positives, overestimating the number of species in the metagenome. Both are especially problematic for the identification of low-abundance microbial species, e.g. detecting pathogens in ancient metagenomic samples. We present a new method, SPARSE, which improves taxonomic assignments of metagenomic reads. SPARSE balances existing biased reference databases by grouping reference genomes into similarity-based hierarchical clusters, implemented as an efficient incremental data structure. SPARSE assigns reads to these clusters using a probabilistic model, which specifically penalizes non-specific mappings of reads from unknown sources and hence reduces false-positive assignments. Our evaluation on simulated datasets from two recent evaluation studies demonstrated the improved precision of SPARSE in comparison to other methods for species-level classification. In a third simulation, our method successfully differentiated multiple co-existing Escherichia coli strains from the same sample. In real archaeological datasets, SPARSE identified ancient pathogens with ≤ 0.02% abundance, consistent with published findings that required additional sequencing data. In these datasets, other methods either missed targeted pathogens or reported non-existent ones. SPARSE and all evaluation scripts are available at https://github.com/zheminzhou/SPARSE.
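The idea of grouping reference genomes into similarity-based hierarchical clusters that can be updated incrementally can be illustrated with a small sketch. This is not SPARSE's implementation: the k-mer Jaccard similarity, the two threshold values, and the `IncrementalClusters` class are all assumptions chosen for illustration. Each new genome is compared with those already stored and, at each level, either inherits the cluster of its nearest neighbour or founds a new cluster.

```python
from itertools import count


def kmers(seq, k=4):
    """Return the set of k-mers in a sequence (toy stand-in for a genome sketch)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two k-mer sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class IncrementalClusters:
    """Hypothetical incremental, multi-level clustering of reference genomes."""

    def __init__(self, thresholds=(0.9, 0.5)):
        # Assumed similarity cut-offs, ordered from tightest to loosest level.
        self.thresholds = sorted(thresholds, reverse=True)
        self.sketches = {}   # genome name -> k-mer set
        self.labels = {}     # genome name -> tuple of cluster ids, one per level
        self._ids = count()  # source of fresh cluster ids

    def add(self, name, seq):
        """Insert one genome without re-clustering the ones already stored."""
        sketch = kmers(seq)
        # Find the most similar genome inserted so far.
        best_name, best_sim = None, 0.0
        for other, other_sketch in self.sketches.items():
            sim = jaccard(sketch, other_sketch)
            if sim > best_sim:
                best_name, best_sim = other, sim
        # Inherit the neighbour's cluster at every level whose threshold is met;
        # otherwise start a new cluster at that level.
        label = []
        for level, threshold in enumerate(self.thresholds):
            if best_name is not None and best_sim >= threshold:
                label.append(self.labels[best_name][level])
            else:
                label.append(next(self._ids))
        self.sketches[name] = sketch
        self.labels[name] = tuple(label)
        return self.labels[name]
```

Because a genome that meets the tight threshold also meets every looser one, clusters at the tight level are always nested inside clusters at the looser levels, which is what makes the result a hierarchy. Adding a genome costs one pass over the stored sketches and never changes existing labels.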
Similar resources
Accurate binning of metagenomic contigs via automated clustering sequences using information of genomic signatures and marker genes.
Metagenomics, the application of shotgun sequencing, facilitates the reconstruction of the genomes of individual species from natural environments. A major challenge in the genome recovery domain is to agglomerate or 'bin' sequences assembled from metagenomic reads into individual groups. Metagenomic binning without consideration of reference sequences enables the comprehensive discovery of new...
Probabilistic Inference of Biochemical Reactions in Microbial Communities from Metagenomic Sequences
Shotgun metagenomics has been applied to the studies of the functionality of various microbial communities. As a critical analysis step in these studies, biological pathways are reconstructed based on the genes predicted from metagenomic shotgun sequences. Pathway reconstruction provides insights into the functionality of a microbial community and can be used for comparing multiple microbial co...
Genometa - A Fast and Accurate Classifier for Short Metagenomic Shotgun Reads
Metagenomic studies use high-throughput sequence data to investigate microbial communities in situ. However, considerable challenges remain in the analysis of these data, particularly with regard to speed and reliable analysis of microbial species as opposed to higher level taxa such as phyla. We here present Genometa, a computationally undemanding graphical user interface program th...
Global metagenomic survey reveals a new bacterial candidate phylum in geothermal springs.
Analysis of the increasing wealth of metagenomic data collected from diverse environments can lead to the discovery of novel branches on the tree of life. Here we analyse 5.2 Tb of metagenomic data collected globally to discover a novel bacterial phylum ('Candidatus Kryptonia') found exclusively in high-temperature pH-neutral geothermal springs. This lineage had remained hidden as a taxonomic '...
ReprDB and panDB: minimalist databases with maximal microbial representation
BACKGROUND Profiling of shotgun metagenomic samples is hindered by a lack of unified microbial reference genome databases that (i) assemble genomic information from all open access microbial genomes, (ii) have relatively small sizes, and (iii) are compatible to various metagenomic read mapping tools. Moreover, computational tools to rapidly compile and update such databases to accommodate the r...